using the RATS data
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.0.3
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
## Warning: package 'tidyr' was built under R version 4.0.3
RATSL <- read.csv("C:/Users/susac/Documents/R/IODS-project/data/RATSL.csv", sep = ",", header = TRUE, row.names = 1)
glimpse(RATSL)
## Rows: 176
## Columns: 5
## $ ID <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1, 2,...
## $ Group <int> 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 1, 1, 1, 1, ...
## $ WD <chr> "WD1", "WD1", "WD1", "WD1", "WD1", "WD1", "WD1", "WD1", "WD1...
## $ Weight <int> 240, 225, 245, 260, 255, 260, 275, 245, 410, 405, 445, 555, ...
## $ Time <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 8, 8, 8, 8, ...
dim(RATSL)
## [1] 176 5
RATSL$ID <- factor(RATSL$ID) #factor again
RATSL$Group <- factor(RATSL$Group)
glimpse(RATSL)
## Rows: 176
## Columns: 5
## $ ID <fct> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1, 2,...
## $ Group <fct> 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 1, 1, 1, 1, ...
## $ WD <chr> "WD1", "WD1", "WD1", "WD1", "WD1", "WD1", "WD1", "WD1", "WD1...
## $ Weight <int> 240, 225, 245, 260, 255, 260, 275, 245, 410, 405, 445, 555, ...
## $ Time <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 8, 8, 8, 8, ...
The Rats data set encompasses the weight of the rats, the weekdays, the ID number and the group number. WE have 3 groups of rats getting a different diet.
visualization of the data
## Warning: package 'ggplot2' was built under R version 4.0.3
## <ScaleContinuousPosition>
## Range:
## Limits: 225 -- 628
The first graph shows that the rats in group 1 compared to group 2 and 3 have a lower start weight. In group 2 we have rat 12 which seems to be an outlier because its weight more than the other rats. Also in group 1 (rat 2) and group 3 (rat 13) have a different weight compared to the other rats within the group. In this graph it seems that in group 1 all mice gain weight, as well in group 2 and 3.
visualization of standardized data
## Rows: 176
## Columns: 6
## $ ID <fct> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1,...
## $ Group <fct> 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 1, 1, 1, ...
## $ WD <chr> "WD1", "WD1", "WD1", "WD1", "WD1", "WD1", "WD1", "WD1", "...
## $ Weight <int> 240, 225, 245, 260, 255, 260, 275, 245, 410, 405, 445, 55...
## $ Time <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 8, 8, 8, ...
## $ stdweight <dbl> -1.0011429, -1.1203857, -0.9613953, -0.8421525, -0.881900...
The standardized plot shows now, that in group 1 the overall weight is stable, of course within subject differences. In group 2 we have a weight gain except 1 rat, which is loosing weight. In group 3 it is 50:50 of weight gain.
visualization of a summary
n <- RATSL$Time %>% unique() %>% length()
RATSS <- RATSL %>%
group_by(Time, Group) %>%
summarise( mean = mean(Weight), se = sd(Weight)/sqrt(n) ) %>%
ungroup()
## `summarise()` regrouping output by 'Time' (override with `.groups` argument)
ggplot(RATSS, aes(x = Time, y = mean, shape = Group, linetype= Group)) +
geom_line() +
scale_linetype_manual(values = c(1,2,3)) +
geom_point(size=3) +
scale_shape_manual(values = c(1,2,3)) +
geom_errorbar(aes(ymin = mean - se, ymax = mean + se, linetype="1"), width=0.3) +
theme(legend.position = c(0.9,0.5)) +
scale_y_continuous(name = "mean(Weight) +/- se(Weight)")
This plot shows the mean weight gain of each group against the time. Also here the graphs shows that the start weight is different between the groups. The weight gain of group 1 seems to be the slowest on, whereas group 2 has the fastest weight gain. Group 3 is between the other both groups regarding weight gain.
Applying the summary approach
library(dplyr)
library(tidyr)
RATSL8S <- RATSL %>%
filter(Time > 0) %>%
group_by(Group, ID) %>%
summarise( mean=mean(Weight) ) %>%
ungroup()
## `summarise()` regrouping output by 'Group' (override with `.groups` argument)
glimpse(RATSL8S)
## Rows: 16
## Columns: 3
## $ Group <fct> 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3
## $ ID <fct> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
## $ mean <dbl> 261.0909, 237.6364, 260.1818, 266.5455, 269.4545, 274.7273, 2...
# visualization with outliers
ggplot(RATSL8S, aes(x = Group, y = mean)) +
geom_boxplot() +
stat_summary(fun = "mean", geom = "point", shape=23, size=4, fill = "green") +
scale_y_continuous(name = "mean(weight), days 1-64")
#visualization without 1 big outlier for group 2
RATSL8S1 <- RATSL8S %>%
filter(ID !=2& ID !=12 & ID !=13)
ggplot(RATSL8S1, aes(x = Group, y = mean)) +
geom_boxplot() +
stat_summary(fun.y = "mean", geom = "point", shape=23, size=4, fill = "blue") +
scale_y_continuous(name = "mean(weight), days 1-64 without outlier")
## Warning: `fun.y` is deprecated. Use `fun` instead.
This boxplot shows the overall weight of each group. We can see that all groups have an outlier. Especially group 2 has a huge one. (graph 2 shows the group without outlier in group 2). As already assumed, all three groups have a different total weight with the highest weight in group 3.
t-Test
Because we have 3 groups we cannot use t-test, because for a t-test it has to be exactly 2 groups. Therefore, we use the anova test.
Variance analysis
# only used the long wide data, and I don't know yet how to convert them in wide form again, I sued the original data again
RATS<-read.table("https://raw.githubusercontent.com/KimmoVehkalahti/MABS/master/Examples/data/rats.txt", sep="", header=TRUE)
RATS$ID <- factor(RATS$ID) #factor again
RATS$Group <- factor(RATS$Group)
glimpse(RATS)
## Rows: 16
## Columns: 13
## $ ID <fct> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
## $ Group <fct> 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3
## $ WD1 <int> 240, 225, 245, 260, 255, 260, 275, 245, 410, 405, 445, 555, 4...
## $ WD8 <int> 250, 230, 250, 255, 260, 265, 275, 255, 415, 420, 445, 560, 4...
## $ WD15 <int> 255, 230, 250, 255, 255, 270, 260, 260, 425, 430, 450, 565, 4...
## $ WD22 <int> 260, 232, 255, 265, 270, 275, 270, 268, 428, 440, 452, 580, 4...
## $ WD29 <int> 262, 240, 262, 265, 270, 275, 273, 270, 438, 448, 455, 590, 4...
## $ WD36 <int> 258, 240, 265, 268, 273, 277, 274, 265, 443, 460, 455, 597, 4...
## $ WD43 <int> 266, 243, 267, 270, 274, 278, 276, 265, 442, 458, 451, 595, 4...
## $ WD44 <int> 266, 244, 267, 272, 273, 278, 271, 267, 446, 464, 450, 595, 5...
## $ WD50 <int> 265, 238, 264, 274, 276, 284, 282, 273, 456, 475, 462, 612, 5...
## $ WD57 <int> 272, 247, 268, 273, 278, 279, 281, 274, 468, 484, 466, 618, 5...
## $ WD64 <int> 278, 245, 269, 275, 280, 281, 284, 278, 478, 496, 472, 628, 5...
RATSL8S2 <- RATSL8S %>%
mutate(baseline = RATS$WD1)
fit <- lm(mean ~ baseline + Group, data = RATSL8S2) # using the linear model because ANOVA is a type of that
anova(fit)
## Analysis of Variance Table
##
## Response: mean
## Df Sum Sq Mean Sq F value Pr(>F)
## baseline 1 252125 252125 2237.0655 5.217e-15 ***
## Group 2 726 363 3.2219 0.07586 .
## Residuals 12 1352 113
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Anova has some assumptions like linearity between covariante and outcome variable, homogeneity of regression slope, outcome variable should be normal distributed, homoscedasticity and no significant outliers. The outliers we already removed in the step above. The other assumptions we assume they are right. The output shows that the baseline measurements are strongly related to the weight values taken after the weekdays (p<0.001). However it seems that the diet made not big difference between the groups after 64WDs. The p-values are 0.1.
using the BPRS data
library(dplyr)
library(tidyr)
BPRSL <- read.csv("C:/Users/susac/Documents/R/IODS-project/data/BPRSL.csv", sep = ",", header = TRUE, row.names = 1)
glimpse(BPRSL)
## Rows: 360
## Columns: 5
## $ treatment <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
## $ subject <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17...
## $ weeks <chr> "week0", "week0", "week0", "week0", "week0", "week0", "we...
## $ bprs <int> 42, 58, 54, 55, 72, 48, 71, 30, 41, 57, 30, 55, 36, 38, 6...
## $ week <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
dim(BPRSL)
## [1] 360 5
BPRSL$treatment <- factor(BPRSL$treatment) # factor again
BPRSL$subject <- factor(BPRSL$subject)
glimpse(BPRSL)
## Rows: 360
## Columns: 5
## $ treatment <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
## $ subject <fct> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17...
## $ weeks <chr> "week0", "week0", "week0", "week0", "week0", "week0", "we...
## $ bprs <int> 42, 58, 54, 55, 72, 48, 71, 30, 41, 57, 30, 55, 36, 38, 6...
## $ week <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
These data set shows the bprs points of 40 men of 8 weeks. these 40 men were split randomly in 2 groups (treatment 1 and 2) at the beginning of the study.
visualization
ggplot(BPRSL, aes(x = week, y = bprs, color = subject)) +
geom_line() +
scale_linetype_manual(values = rep(1:10, times=4)) +
facet_grid(. ~ treatment, labeller = label_both) +
theme(legend.position = "none") +
scale_y_continuous(limits = c(min(BPRSL$bprs), max(BPRSL$bprs)))
We can see that almost all bprs values decrease over the 8 weeks and that higher bprs scores at the beginning have usually also higher scores at the end.
Linear Regression Model
BPRS_reg<-lm(bprs~week+treatment, BPRSL)
summary(BPRS_reg)
##
## Call:
## lm(formula = bprs ~ week + treatment, data = BPRSL)
##
## Residuals:
## Min 1Q Median 3Q Max
## -22.454 -8.965 -3.196 7.002 50.244
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 46.4539 1.3670 33.982 <2e-16 ***
## week -2.2704 0.2524 -8.995 <2e-16 ***
## treatment2 0.5722 1.3034 0.439 0.661
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.37 on 357 degrees of freedom
## Multiple R-squared: 0.1851, Adjusted R-squared: 0.1806
## F-statistic: 40.55 on 2 and 357 DF, p-value: < 2.2e-16
In this model we ignore the repeated measures structure. Also we assume that the repeated measures are independent. The output of the linear regression model shows that averaged the bprs scores decres around 2 points per week. This results is significant with a p-value of <0.001. It seems that the different treatment have no significant influence on the bprs scores (p-vale of treatment 2 =0.661).
Random Intercept Model
library(lme4)
## Warning: package 'lme4' was built under R version 4.0.3
## Loading required package: Matrix
##
## Attaching package: 'Matrix'
## The following objects are masked from 'package:tidyr':
##
## expand, pack, unpack
BPRS_ref <- lmer(bprs ~ week + treatment + (1 | subject), data = BPRSL, REML = FALSE)
summary(BPRS_ref)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: bprs ~ week + treatment + (1 | subject)
## Data: BPRSL
##
## AIC BIC logLik deviance df.resid
## 2748.7 2768.1 -1369.4 2738.7 355
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.0481 -0.6749 -0.1361 0.4813 3.4855
##
## Random effects:
## Groups Name Variance Std.Dev.
## subject (Intercept) 47.41 6.885
## Residual 104.21 10.208
## Number of obs: 360, groups: subject, 20
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 46.4539 1.9090 24.334
## week -2.2704 0.2084 -10.896
## treatment2 0.5722 1.0761 0.532
##
## Correlation of Fixed Effects:
## (Intr) week
## week -0.437
## treatment2 -0.282 0.000
Random Intercept and Random Slope Model
BPRS_ref1 <- lmer(bprs ~ week + treatment + (week | subject), data = BPRSL, REML = FALSE)
summary(BPRS_ref1)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: bprs ~ week + treatment + (week | subject)
## Data: BPRSL
##
## AIC BIC logLik deviance df.resid
## 2745.4 2772.6 -1365.7 2731.4 353
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.8919 -0.6194 -0.0691 0.5531 3.7976
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## subject (Intercept) 64.8222 8.0512
## week 0.9609 0.9802 -0.51
## Residual 97.4305 9.8707
## Number of obs: 360, groups: subject, 20
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 46.4539 2.1052 22.066
## week -2.2704 0.2977 -7.626
## treatment2 0.5722 1.0405 0.550
##
## Correlation of Fixed Effects:
## (Intr) week
## week -0.582
## treatment2 -0.247 0.000
anova(BPRS_ref1, BPRS_ref)
## Data: BPRSL
## Models:
## BPRS_ref: bprs ~ week + treatment + (1 | subject)
## BPRS_ref1: bprs ~ week + treatment + (week | subject)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## BPRS_ref 5 2748.7 2768.1 -1369.4 2738.7
## BPRS_ref1 7 2745.4 2772.6 -1365.7 2731.4 7.2721 2 0.02636 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Random Intercept and Random Slope Model with interaction
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: bprs ~ week * treatment + (week | subject)
## Data: BPRSL
##
## AIC BIC logLik deviance df.resid
## 2744.3 2775.4 -1364.1 2728.3 352
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.0512 -0.6271 -0.0768 0.5288 3.9260
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## subject (Intercept) 64.9964 8.0620
## week 0.9687 0.9842 -0.51
## Residual 96.4707 9.8220
## Number of obs: 360, groups: subject, 20
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 47.8856 2.2521 21.262
## week -2.6283 0.3589 -7.323
## treatment2 -2.2911 1.9090 -1.200
## week:treatment2 0.7158 0.4010 1.785
##
## Correlation of Fixed Effects:
## (Intr) week trtmn2
## week -0.650
## treatment2 -0.424 0.469
## wek:trtmnt2 0.356 -0.559 -0.840
## Data: BPRSL
## Models:
## BPRS_ref1: bprs ~ week + treatment + (week | subject)
## BPRS_ref2: bprs ~ week * treatment + (week | subject)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## BPRS_ref1 7 2745.4 2772.6 -1365.7 2731.4
## BPRS_ref2 8 2744.3 2775.4 -1364.1 2728.3 3.1712 1 0.07495 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Amazing, how short 6 weeks can be